Supplementary Material for "SE(3) Diffusion Model-based Point Cloud Registration for Robust 6D Object Pose Estimation"
The SE(3) diffusion model for point cloud registration can be derived as follows. By inserting Eq. 5 into the variational lower bound in Eq. 4, we can further rewrite the variational lower bound. As demonstrated in our main paper, we utilize the Lie algebra to randomly sample the desired perturbation transformation, which randomizes our SE(3) diffusion process. This registration framework exhibits promising registration performance.
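The Lie-algebra sampling described above can be sketched in a few lines: draw a Gaussian twist in se(3) and map its rotational part to SO(3) via the exponential map. This is a minimal illustration, not the paper's implementation; the noise scales `rot_std` and `trans_std` are assumed placeholders.

```python
import numpy as np

def so3_exp(phi):
    """Rodrigues' formula: map an axis-angle vector phi in R^3 to a rotation matrix."""
    theta = np.linalg.norm(phi)
    if theta < 1e-8:
        return np.eye(3)
    k = phi / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def sample_se3_perturbation(rot_std=0.1, trans_std=0.05, rng=None):
    """Sample a random rigid perturbation by drawing Gaussian noise in the
    Lie algebra and mapping it to SE(3). rot_std (radians) and trans_std
    (scene units) are illustrative scales, not values from the paper."""
    rng = np.random.default_rng() if rng is None else rng
    T = np.eye(4)
    T[:3, :3] = so3_exp(rng.normal(scale=rot_std, size=3))  # rotational part
    T[:3, 3] = rng.normal(scale=trans_std, size=3)          # translational part
    return T
```

Sampling in the tangent space like this guarantees the perturbation is always a valid rigid transform, which is why Lie-algebra parameterizations are the standard choice for randomizing SE(3).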
Category-Level 6D Object Pose Estimation in the Wild: A Semi-Supervised Learning Approach and A New Dataset
While a lot of recent efforts have been made on generalizing pose estimation to novel object instances within the same category, namely category-level 6D pose estimation, it is still restricted to constrained environments given the limited amount of annotated data. In this paper, we collect Wild6D, a new unlabeled RGBD object video dataset with diverse instances and backgrounds. We utilize this data to generalize category-level 6D object pose estimation in the wild with semi-supervised learning. We propose a new model, called Rendering for Pose estimation network (RePoNet), that is jointly trained using the free ground truths of the synthetic data and a silhouette-matching objective function on the real-world data. Without using any 3D annotations on real data, our method outperforms state-of-the-art methods on the previous dataset and our Wild6D test set (with manual annotations for evaluation) by a large margin.
SE(3) Diffusion Model-based Point Cloud Registration for Robust 6D Object Pose Estimation
In this paper, we introduce an SE(3) diffusion model-based point cloud registration framework for 6D object pose estimation in real-world scenarios. Our approach formulates the 3D registration task as a denoising diffusion process, which progressively refines the pose of the source point cloud to obtain a precise alignment with the model point cloud. Training our framework involves two operations: An SE(3) diffusion process and an SE(3) reverse process.
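The forward/reverse structure described in the abstract can be sketched as a standard DDPM forward step applied to a pose twist, i.e. a 6-vector in se(3), scaled toward zero and perturbed with scheduled Gaussian noise. This is a hedged sketch of the generic diffusion mechanism, not the paper's actual training code; `betas` is an assumed noise schedule.

```python
import numpy as np

def forward_diffuse(xi0, t, betas, rng=None):
    """One-shot forward diffusion of a ground-truth pose twist xi0 in se(3):
    xi_t = sqrt(a_bar_t) * xi0 + sqrt(1 - a_bar_t) * eps, eps ~ N(0, I).
    Returns the noised twist and the noise (the usual denoising target)."""
    rng = np.random.default_rng() if rng is None else rng
    a_bar = np.cumprod(1.0 - np.asarray(betas))[t]  # cumulative signal fraction
    eps = rng.normal(size=6)
    return np.sqrt(a_bar) * xi0 + np.sqrt(1.0 - a_bar) * eps, eps
```

In the reverse process, a learned denoiser would iterate this in the opposite direction, predicting and removing the noise at each step so the source point cloud's pose progressively converges to the aligned one.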
Is Image-based Object Pose Estimation Ready to Support Grasping?
Joyce, Eric C., Zhao, Qianwen, Burgdorfer, Nathaniel, Wang, Long, Mordohai, Philippos
We present a framework for evaluating 6-DoF instance-level object pose estimators, focusing on those that require a single RGB (not RGB-D) image as input. Besides gaining intuition about how accurate these estimators are, we are interested in the degree to which they can serve as the sole perception mechanism for robotic grasping. To assess this, we perform grasping trials in a physics-based simulator, using image-based pose estimates to guide a parallel gripper and an underactuated robotic hand in picking up 3D models of objects. Our experiments on a subset of the BOP (Benchmark for 6D Object Pose Estimation) dataset compare five open-source object pose estimators and provide insights that were missing from the literature.
SCOPE: Semantic Conditioning for Sim2Real Category-Level Object Pose Estimation in Robotics
Hönig, Peter, Thalhammer, Stefan, Weibel, Jean-Baptiste, Hirschmanner, Matthias, Vincze, Markus
Abstract-- Object manipulation requires accurate object pose estimation. In open environments, robots encounter unknown objects, which requires semantic understanding in order to generalize both to known categories and beyond. To resolve this challenge, we present SCOPE, a diffusion-based category-level object pose estimation model that eliminates the need for discrete category labels by leveraging DINOv2 features as continuous semantic priors. By combining these DINOv2 features with photorealistic training data and a noise model for point normals, we reduce the Sim2Real gap in category-level object pose estimation. Furthermore, injecting the continuous semantic priors via cross-attention enables SCOPE to learn canonicalized object coordinate systems across object instances beyond the distribution of known categories. SCOPE outperforms the current state of the art in synthetically trained category-level object pose estimation, achieving a relative improvement of 31.9%. Additional experiments on two instance-level datasets demonstrate generalization beyond known object categories, enabling grasping of unseen objects from unknown categories with a success rate of up to 100%.
I. INTRODUCTION
Autonomous manipulation and scene understanding require accurate object poses [3], with the choice of algorithm depending on the available object priors.
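The conditioning mechanism the abstract describes, injecting continuous semantic priors via cross-attention, reduces to a standard attention block in which pose queries attend to semantic context tokens (e.g. DINOv2 patch features). The following is a minimal single-head sketch under that assumption; the projection matrices and dimensions are illustrative, not SCOPE's architecture.

```python
import numpy as np

def cross_attention(queries, context, Wq, Wk, Wv):
    """Single-head cross-attention: each query row attends over all context
    rows (semantic feature tokens) and returns a convex combination of their
    projected values."""
    Q = queries @ Wq                  # (n_q, d)
    K = context @ Wk                  # (n_c, d)
    V = context @ Wv                  # (n_c, d)
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # softmax stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)      # rows sum to 1
    return attn @ V
```

Because the output is a convex combination of the context values, the semantic prior steers the pose features without requiring a discrete category label, which is the point of replacing labels with continuous features.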
Consensus-Driven Uncertainty for Robotic Grasping based on RGB Perception
Joyce, Eric C., Zhao, Qianwen, Burgdorfer, Nathaniel, Wang, Long, Mordohai, Philippos
-- Deep object pose estimators are notoriously overconfident. A grasping agent that both estimates the 6-DoF pose of a target object and predicts the uncertainty of its own estimate could avoid task failure by choosing not to act under high uncertainty. Even though object pose estimation improves and uncertainty quantification research continues to make strides, few studies have connected them to the downstream task of robotic grasping. We propose a method for training lightweight, deep networks to predict whether a grasp guided by an image-based pose estimate will succeed before that grasp is attempted. We generate training data for our networks via object pose estimation on real images and simulated grasping. We also find that, despite high object variability in grasping trials, networks benefit from training on all objects jointly, suggesting that a diverse variety of objects can nevertheless contribute to the same goal. Remarkable progress in object pose estimation from single RGB images has been made in the past few years [1]-[4], primarily driven by deep learning and the ability to reduce the so-called sim2real gap. This has enabled end-to-end system training on large amounts of synthetic data with precise ground truth. Consider, for example, the pose estimates illustrated in Figure 1. These were made by current methods, yet all four caused grasping attempts to fail when used as guides. Motivated by this disconnect between pose evaluation and success in downstream grasping, we propose an approach to estimate the likelihood of success before a grasp is actually attempted.
Object Pose Estimation by Camera Arm Control Based on the Next Viewpoint Estimation
Mizuno, Tomoki, Yabashi, Kazuya, Tasaki, Tsuyoshi
We have developed a new method to estimate a Next Viewpoint (NV) that is effective for pose estimation of simple-shaped products by product-display robots in retail stores. Pose estimation methods using neural networks (NNs) based on an RGBD camera are highly accurate, but their accuracy decreases significantly when the camera acquires few texture and shape features at the current viewpoint. However, it is difficult for previous mathematical model-based methods to estimate an effective NV because simple-shaped objects have few shape features. We therefore focus on the relationship between pose estimation and NV estimation: when the pose estimation is more accurate, the NV estimation is also more accurate. Accordingly, we develop a new pose estimation NN that estimates the NV simultaneously. Experimental results showed that our NV estimation achieved a pose estimation success rate of 77.3%, which was 7.4 points higher than the mathematical model-based NV calculation. Moreover, we verified that the robot using our method displayed 84.2% of products.
ViTa-Zero: Zero-shot Visuotactile Object 6D Pose Estimation
Li, Hongyu, Akl, James, Sridhar, Srinath, Brady, Tye, Padir, Taskin
-- Object 6D pose estimation is a critical challenge in robotics, particularly for manipulation tasks. While prior research combining visual and tactile (visuotactile) information has shown promise, these approaches often struggle with generalization due to the limited availability of visuotactile data. In this paper, we introduce ViTa-Zero, a zero-shot visuotactile pose estimation framework. Our key innovation lies in leveraging a visual model as its backbone and performing feasibility checking and test-time optimization based on physical constraints derived from tactile and proprioceptive observations. Specifically, we model the gripper-object interaction as a spring-mass system, where tactile sensors induce attractive forces and proprioception generates repulsive forces. We validate our framework through experiments on a real-world robot setup, demonstrating its effectiveness across representative visual backbones and manipulation scenarios, including grasping, object picking, and bimanual handover. Compared to the visual models, our approach overcomes some drastic failure modes while tracking the in-hand object pose. In our experiments, our approach shows an average increase of 55% in AUC of ADD-S and 60% in ADD, along with an 80% lower position error compared to FoundationPose.
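The spring-mass intuition above can be sketched as a net correction force on the pose hypothesis: tactile contact points act as springs pulling the estimate toward them, while non-contact finger positions from proprioception push it away. This is a toy illustration of the stated physical model, not ViTa-Zero's optimizer; the gains `k_attr` and `k_rep` and the repulsion law are assumed placeholders.

```python
import numpy as np

def net_correction(obj_pos, tactile_contacts, finger_positions,
                   k_attr=1.0, k_rep=1.0):
    """Net force on an object-position hypothesis obj_pos (3-vector):
    linear springs toward tactile contacts, inverse-square-style repulsion
    from proprioceptive finger positions without contact."""
    f = np.zeros(3)
    for c in tactile_contacts:
        f += k_attr * (np.asarray(c) - obj_pos)   # spring pulls estimate toward contact
    for p in finger_positions:
        d = obj_pos - np.asarray(p)
        dist = np.linalg.norm(d) + 1e-9
        f += k_rep * d / dist**3                  # pushes estimate away from free fingers
    return f
```

Iterating small steps along this force is one simple way such physical constraints could refine a visually estimated in-hand pose at test time.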
Improving 6D Object Pose Estimation of Metallic Household and Industry Objects
Pöllabauer, Thomas, Gasser, Michael, Wirth, Tristan, Berkei, Sarah, Knauthe, Volker, Kuijper, Arjan
6D object pose estimation suffers from reduced accuracy when applied to metallic objects. We set out to improve the state-of-the-art by addressing challenges such as reflections and specular highlights in industrial applications. Our novel BOP-compatible dataset, featuring a diverse set of metallic objects (cans, household, and industrial items) under various lighting and background conditions, provides additional geometric and visual cues. We demonstrate that these cues can be effectively leveraged to enhance overall performance. To illustrate the usefulness of the additional features, we improve upon the GDRNPP algorithm by introducing an additional keypoint prediction and material estimator head in order to improve spatial scene understanding. Evaluations on the new dataset show improved accuracy for metallic objects, supporting the hypothesis that additional geometric and visual cues can improve learning.